Fix AsyncioConnection race conditions causing EBADF errors (#614)#697
Fix AsyncioConnection race conditions causing EBADF errors (#614)#697dkropachev wants to merge 1 commit intomasterfrom
Conversation
46524ac to
7b815c8
Compare
2b9d555 to
85d08e1
Compare
85d08e1 to
776d2d3
Compare
Fix race conditions in AsyncioConnection that cause "[Errno 9] Bad file descriptor" errors during node restarts, especially with TLS: 1. close() now waits for _close() to complete when called from outside the event loop thread, eliminating the window where is_closed=True but the socket fd is still open. 2. handle_read() sets last_error on server EOF so factory() detects dead connections instead of returning them to callers. 3. handle_write() treats peer disconnections as clean close instead of defuncting, and both I/O handlers skip defunct() if the connection is already shutting down. Peer-disconnect detection is extracted into _is_peer_disconnect() helper covering platform-specific behaviors: - Windows: ProactorEventLoop raises plain OSError with winerror 10054 (WSAECONNRESET) or 10053 (WSAECONNABORTED) instead of ConnectionResetError. Detection uses ConnectionError base class plus winerror check. - macOS: Raises OSError(57) ENOTCONN when writing to a peer-disconnected socket, which is not a ConnectionError subclass. Detection uses errno-based checks for ENOTCONN, ESHUTDOWN, ECONNRESET, and ECONNABORTED. - Windows _close(): ProactorEventLoop does not support remove_reader/remove_writer (raises NotImplementedError). These calls are wrapped so the socket is always closed regardless, and try/finally ensures connected_event is always set even if cleanup fails.
d73e85d to
9915064
Compare
|
@dkropachev would recommand testing SSL with this fix, we know SSL was failing/stuck with asyncio connection, hence all the fuss with it being a default or not |
It was not failing because of race conditions, but because setting up SSL socket has a totally different API in asyncio, and it was just never implemented here. |
| try: | ||
| self._socket.close() | ||
| except OSError: | ||
| # Ignore if socket is already closed | ||
| pass |
There was a problem hiding this comment.
Why would the socket be already closed? self._socket.close() is only called in this one place. _close is only called from close, which checks and sets is_closed, so _close has no way of running multiple times.
| except Exception: | ||
| # It is not critical if it fails, driver can keep working, | ||
| # but it should not be happening, so logged as error | ||
| log.error("Unexpected error removing reader for %s", | ||
| self.endpoint, exc_info=True) | ||
|
|
There was a problem hiding this comment.
In previous version you were catching OsError, and explained that it is thrown when the socket is closed. Again, I don't see how this is possible, because the only place where the socket is being closed is after this code.
| fd = self._socket.fileno() | ||
| if fd >= 0: |
| # Errno values that indicate the remote peer has disconnected. | ||
| _PEER_DISCONNECT_ERRNOS = frozenset(( | ||
| errno.ENOTCONN, errno.ESHUTDOWN, | ||
| errno.ECONNRESET, errno.ECONNABORTED, | ||
| errno.EBADF, | ||
| )) |
There was a problem hiding this comment.
Won't it be a ConnectionError in all those cases?
| if self.is_closed or self.is_defunct: | ||
| raise ConnectionShutdown( | ||
| "Connection to %s is already closed" % self.endpoint) |
There was a problem hiding this comment.
As far as I can tell, push is only called from Connection::send_msg. send_msg already starts with:
if self.is_defunct:
raise ConnectionShutdown("Connection to %s is defunct" % self.endpoint)
elif self.is_closed:
raise ConnectionShutdown("Connection to %s is closed" % self.endpoint)
elif not self._socket_writable:
raise ConnectionBusy("Connection %s is overloaded" % self.endpoint)
So what does this check give us?
Summary
close()to wait for_close()completion from non-event-loop threads, eliminating the window whereis_closed=Truebut the socket fd is still open (root cause of EBADF)last_erroron server EOF inhandle_read()sofactory()detects dead connections instead of returning themis_closed/is_defunctguard inpush()to reject writes on closed connectionsBrokenPipeError/ConnectionResetErrorinhandle_write()as clean peer disconnections instead of defuncting, and skipdefunct()in both I/O handlers if the connection is already shutting downTest plan
tests/unit/io/test_asyncio_race_614.pycover all four race conditions using real socket pairs